2 research outputs found
PM-FSM: Policies Modulating Finite State Machine for Robust Quadrupedal Locomotion
Deep reinforcement learning (deep RL) has emerged as an effective tool for
developing controllers for legged robots. However, vanilla deep RL often
requires a tremendous amount of training samples and is not feasible for
achieving robust behaviors. Instead, researchers have investigated policy
architectures that incorporate human experts' knowledge without requiring
time-consuming interactive teaching, such as Policies Modulating Trajectory
Generators (PMTG). This architecture builds a recurrent control loop by
combining a parametric trajectory generator (TG) and a feedback policy network
to achieve more robust behaviors using intuitive prior knowledge. In this work,
we propose Policies Modulating Finite State Machine (PM-FSM) by replacing TGs
with contact-aware finite state machines (FSM), which offer more flexible
control of each leg. Compared with TGs, FSMs offer high-level management of
each leg's motion generator and enable a flexible state arrangement, which makes
the learned behavior less vulnerable to unseen perturbations or challenging
terrains. This design gives the policy an explicit notion of contact events,
which helps it negotiate unexpected perturbations. We demonstrate that the
proposed architecture can achieve more robust behaviors in various scenarios, such as
challenging terrains or external perturbations, on both simulated and real
robots. The supplemental video can be found at: https://youtu.be/78cboMqTkJQ
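
To make the idea of a contact-aware FSM more concrete, the following is a minimal Python sketch of a state machine for a single leg. It is not the authors' implementation: the class name LegFSM, the two-state layout (stance and swing), and the parameters swing_time and stance_time, which stand in for quantities a policy might modulate, are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class LegState(Enum):
    STANCE = 0
    SWING = 1


@dataclass
class LegFSM:
    """Two-state machine for one leg (hypothetical, for illustration only)."""

    state: LegState = LegState.STANCE
    elapsed: float = 0.0  # time spent in the current state, in seconds

    def step(self, dt, in_contact, swing_time, stance_time):
        """Advance by dt seconds and return the resulting state.

        swing_time and stance_time are assumed to come from the policy.
        """
        self.elapsed += dt
        if self.state is LegState.SWING:
            # Contact-aware transition: a sensed touchdown ends the swing
            # early; a timeout ends it even if no contact was sensed.
            # (A real controller would likely ignore contact right after
            # liftoff; that guard is omitted here for brevity.)
            if in_contact or self.elapsed >= swing_time:
                self._enter(LegState.STANCE)
        elif self.elapsed >= stance_time:
            self._enter(LegState.SWING)
        return self.state

    def _enter(self, new_state):
        self.state = new_state
        self.elapsed = 0.0


# Usage: stance times out, then an early touchdown cuts the swing short.
leg = LegFSM()
for in_contact in [True, True, False, True]:
    leg.step(dt=0.1, in_contact=in_contact, swing_time=0.3, stance_time=0.15)
```

Because the transition depends on the sensed contact rather than on a fixed phase alone, the leg can react to an early touchdown, which is the kind of explicit contact event the abstract refers to.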
Learning a Single Policy for Diverse Behaviors on a Quadrupedal Robot using Scalable Motion Imitation
Learning various motor skills for quadrupedal robots is a challenging problem
that requires careful design of task-specific mathematical models or reward
descriptions. In this work, we propose to learn a single capable policy using
deep reinforcement learning by imitating a large number of reference motions,
including walking, turning, pacing, jumping, sitting, and lying. On top of the
existing motion imitation framework, we first carefully design the observation
space, the action space, and the reward function to improve the scalability of
the learning as well as the robustness of the final policy. In addition, we
adopt a novel adaptive motion sampling (AMS) method, which maintains a balance
between successful and unsuccessful behaviors. This technique allows the
learning algorithm to focus on challenging motor skills and avoid catastrophic
forgetting. We demonstrate that the learned policy can exhibit diverse
behaviors in simulation by successfully tracking both the training dataset and
out-of-distribution trajectories. We also validate the importance of the
proposed learning formulation and the adaptive motion sampling scheme by
conducting experiments
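
The abstract does not specify how adaptive motion sampling is computed, so the following Python sketch shows only one plausible scheme under stated assumptions: each reference motion keeps a running success rate, motions that fail more often are sampled more often, and a minimum weight keeps mastered motions in rotation. The class AdaptiveMotionSampler and the parameters floor and smoothing are hypothetical and are not taken from the paper.

```python
import random


class AdaptiveMotionSampler:
    """Hypothetical failure-weighted sampler; not the paper's exact rule."""

    def __init__(self, motion_names, floor=0.1, smoothing=0.9, seed=None):
        # Running success rate per motion, starting from "never succeeded".
        self.success = {name: 0.0 for name in motion_names}
        self.floor = floor          # minimum weight, so mastered motions are revisited
        self.smoothing = smoothing  # exponential moving average factor
        self.rng = random.Random(seed)

    def weights(self):
        """Weight each motion by its estimated failure rate, with a floor."""
        return {n: max(1.0 - s, self.floor) for n, s in self.success.items()}

    def sample(self):
        """Draw one motion name in proportion to its current weight."""
        w = self.weights()
        names = list(w)
        return self.rng.choices(names, weights=[w[n] for n in names], k=1)[0]

    def update(self, name, succeeded):
        """Fold the outcome of one rollout into the running success rate."""
        old = self.success[name]
        self.success[name] = (
            self.smoothing * old + (1.0 - self.smoothing) * float(succeeded)
        )


# Usage: after many successful "walk" rollouts, "jump" is sampled more often.
sampler = AdaptiveMotionSampler(["walk", "jump"], seed=0)
for _ in range(50):
    sampler.update("walk", succeeded=True)
next_motion = sampler.sample()
```

The floor is what addresses forgetting in this sketch: without it, a fully mastered motion would receive zero weight and never be practiced again.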